A Multiclassifier based Document Categorization System: profiting from the Singular Value Decomposition Dimensionality Reduction Technique
Authors
Abstract
In this paper we present a multiclassifier approach for multilabel document classification problems, in which a set of k-NN classifiers predicts the category of text documents. Each classifier is trained on a different database obtained from the original training database by random subsampling. The predictions generated by the multiclassifier are combined by Bayesian voting. Throughout the classification process, training and testing documents are represented by reduced-dimension vectors obtained by Singular Value Decomposition (SVD). The good results of our experiments indicate the potential of the proposed approach.
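The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy term-document matrix, the function names (`svd_projector`, `knn_predict`, `train_ensemble`, `vote`), the cosine similarity measure, and the parameter values are all assumptions made for illustration. The abstract does not specify how Bayesian voting is computed, so the sketch substitutes a simple confidence-weighted vote, where each classifier's confidence is its smoothed accuracy on the training documents left out of its subsample. For brevity it predicts a single category per document rather than multiple labels.

```python
import numpy as np

def svd_projector(train_td, p):
    """Truncated SVD of a terms x documents matrix.
    Returns a function that maps term vectors (as columns) to p dimensions."""
    U, s, _ = np.linalg.svd(train_td, full_matrices=False)
    U_p, s_p = U[:, :p], s[:p]
    # Fold-in projection: d_hat = Sigma_p^{-1} U_p^T d, one row per document.
    return lambda docs_td: ((U_p.T @ docs_td) / s_p[:, None]).T

def knn_predict(X, y, x, k):
    """Majority label among the k training vectors most cosine-similar to x."""
    sims = (X @ x) / (np.linalg.norm(X, axis=1) * np.linalg.norm(x) + 1e-12)
    nearest = np.argsort(-sims)[:k]
    labels, counts = np.unique(y[nearest], return_counts=True)
    return int(labels[np.argmax(counts)])

def train_ensemble(X, y, n_classifiers, sample_size, k, rng):
    """Each member is a random subsample of the training set plus a confidence.
    ASSUMPTION: confidence = Laplace-smoothed accuracy on the left-out documents."""
    members = []
    for _ in range(n_classifiers):
        idx = rng.choice(len(X), size=sample_size, replace=False)
        held_out = np.setdiff1d(np.arange(len(X)), idx)
        hits = sum(knn_predict(X[idx], y[idx], X[j], k) == y[j] for j in held_out)
        members.append((idx, (hits + 1) / (len(held_out) + 2)))
    return members

def vote(X, y, members, x, k):
    """Confidence-weighted vote (a stand-in for the paper's Bayesian voting)."""
    scores = {}
    for idx, confidence in members:
        label = knn_predict(X[idx], y[idx], x, k)
        scores[label] = scores.get(label, 0.0) + confidence
    return max(scores, key=scores.get)

# Toy data (hypothetical): 6 terms x 8 documents, two categories.
train_td = np.array([
    [2, 1, 2, 1, 0, 0, 0, 0],
    [1, 2, 1, 2, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 3, 1, 3, 1],
    [0, 0, 0, 0, 1, 3, 1, 3],
], dtype=float)
train_y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Two unseen documents, one per column.
test_td = np.array([
    [1, 0],
    [2, 0],
    [0, 1],
    [0, 1],
    [0, 2],
    [0, 1],
], dtype=float)

project = svd_projector(train_td, p=2)
train_X = project(train_td)   # shape: (8 documents, 2 dimensions)
test_X = project(test_td)     # shape: (2 documents, 2 dimensions)

rng = np.random.default_rng(0)
members = train_ensemble(train_X, train_y, n_classifiers=5, sample_size=6, k=3, rng=rng)
predictions = [vote(train_X, train_y, members, x, k=3) for x in test_X]
print(predictions)
```

Projecting test documents with the same truncated factors used for the training set keeps both in one reduced space, which is what lets every k-NN member compare them directly. A multilabel variant would need a per-category decision (for example, a threshold on each category's accumulated score), which this sketch leaves out.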
Similar resources
Dimensionality Reduction Aids Term Co-Occurrence Based Multi-Document Summarization
A key task in an extraction system for query-oriented multi-document summarisation, necessary for computing relevance and redundancy, is modelling text semantics. In the Embra system, we use a representation derived from the singular value decomposition of a term co-occurrence matrix. We present methods to show the reliability of performance improvements. We find that Embra performs better with...
A multiclass/multilabel document categorization system: Combining multiple classifiers in a reduced dimension
This article presents a multiclassifier approach for multiclass/multilabel document categorization problems. For the categorization process, we use a reduced vector representation obtained by SVD for training and testing documents, and a set of k-NN classifiers to predict the category of test documents; each k-NN classifier uses a reduced database subsampled from the original training database....
Exploring Basque Document Categorization for Educational Purposes using LSI
In the process of preparing learning material for Computer Supported Learning Systems (CSLSs), one of the first steps involves finding documents relevant to the topics and to the students. This requires documents to be categorized according to some criteria. In this paper we analyze the behaviour of classification techniques such as Naïve Bayes, Winnow, SVMs and k-NN, together with lemmatizatio...
Document Clustering: Before and After the Singular Value Decomposition
Document Clustering is an issue of measuring similarity between documents and grouping similar documents together. Information Retrieval (IR) is an issue of comparing query with a collection of documents to locate a set of documents relevant to a particular query. In the vector space IR model, a query is treated as a document which consists of a few terms. Therefore, in both clustering and retr...
Singular Value Decomposition based Steganography Technique for JPEG2000 Compressed Images
In this paper, a steganography technique for JPEG2000 compressed images using singular value decomposition in wavelet transform domain is proposed. In this technique, DWT is applied on the cover image to get wavelet coefficients and SVD is applied on these wavelet coefficients to get the singular values. Then secret data is embedded into these singular values using scaling factor. Different com...